AITopics | final accuracy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

S-GAI: Spectral Geometry-Aware Initialization for Sigmoidal MLPs -- From Dataset Geometry to Network Weights

Chu, Yi-Shan

arXiv.org Machine Learning | Jun-30-2026

Classical universal approximation theorems establish the expressive power of sigmoidal multilayer perceptrons, but they do not prescribe how initial weights should encode the geometry of a data distribution. We propose S-GAI, a spectral geometry-aware initialization framework for one-hidden-layer sigmoidal MLPs. Starting from the constructive idea that sigmoid units can act as smooth half-space gates, we move from hand-specified planar geometry to class-wise spectral geometry estimated from image data. For each class, SVD provides a mean, principal directions, and spectral scales. An energy threshold selects the retained directions, and each retained direction is represented by two sigmoid gates. These class-specific gates form a shared hidden layer initialized directly from the training set. We also formulate an SVD-based subspace classifier as a non-neural geometric reference, which tests whether the estimated spectral class geometry is already discriminative before being embedded into the MLP. Experiments on MNIST, Fashion-MNIST, and a more challenging CIFAR-10 test show that the S-GAI-initialized MLP starts from a substantially more informative hidden state than Xavier initialization and reaches comparable final accuracy under full training. When the hidden layer is frozen, training only the output layer still gives stronger performance than frozen random gates, providing evidence that S-GAI effectively embeds class-wise spectral geometry into the MLP.
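A minimal sketch of two steps the S-GAI abstract describes, under stated assumptions: choosing how many principal directions to keep with a spectral-energy threshold, and turning one retained direction into two sigmoid half-space gates. The slab construction (both gates near 1 when the projection onto the direction lies within `width * scale` of the class mean) and the names `retained_rank`, `slab_gates`, `activate`, `sharpness`, and `width` are hypothetical illustrations, not the paper's exact parameterization. The class mean, unit-norm principal direction, and spectral scale are assumed to come from a per-class SVD computed elsewhere.

```python
import math
from typing import List, Sequence, Tuple

Gate = Tuple[List[float], float]  # (weight vector, bias) of one sigmoid hidden unit


def retained_rank(singular_values: Sequence[float], energy: float) -> int:
    """Smallest k whose k leading singular values capture at least `energy`
    of the total spectral energy (sum of squared singular values).
    Assumes the singular values are sorted in descending order."""
    squares = [s * s for s in singular_values]
    total = sum(squares)
    if total == 0.0:
        return 0
    running = 0.0
    for k, sq in enumerate(squares, start=1):
        running += sq
        if running / total >= energy:
            return k
    return len(squares)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def slab_gates(mean: Sequence[float], direction: Sequence[float], scale: float,
               sharpness: float = 4.0, width: float = 1.0) -> Tuple[Gate, Gate]:
    """Hypothetical two-gate encoding of one retained direction u.
    With t = u . (x - mean) and h = width * scale:
      lower gate = sigmoid(sharpness * (t + h))  -> near 1 when t > -h
      upper gate = sigmoid(sharpness * (h - t))  -> near 1 when t <  h
    `direction` is assumed to be unit-norm."""
    h = width * scale
    proj_mean = dot(direction, mean)
    lower = ([sharpness * u for u in direction], sharpness * (h - proj_mean))
    upper = ([-sharpness * u for u in direction], sharpness * (h + proj_mean))
    return lower, upper


def activate(gate: Gate, x: Sequence[float]) -> float:
    weights, bias = gate
    return sigmoid(dot(weights, x) + bias)


if __name__ == "__main__":
    # Energies 9, 4, 1: the first two directions hold 13/14 (about 93%) of the total.
    print(retained_rank([3.0, 2.0, 1.0], energy=0.9))  # 2
    lower, upper = slab_gates(mean=[0.0, 0.0], direction=[1.0, 0.0], scale=1.0)
    # At the class mean both gates are open (both close to 0.98).
    print(activate(lower, [0.0, 0.0]), activate(upper, [0.0, 0.0]))
```

Stacking such gate pairs for every retained direction of every class would give a shared hidden layer in the spirit of the abstract; a larger `sharpness` approximates hard half-spaces, while a smaller one keeps the sigmoids in their responsive range for subsequent training.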

artificial intelligence, geometry, machine learning

arXiv.org Machine Learning

2606.28444

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Towards Accelerated Model Training via Bayesian Data Selection

Neural Information Processing Systems | Apr-25-2026, 12:23:38 GMT

Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety simultaneously. Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss. However, its practical adoption relies on less principled approximations and additional holdout data. This work solves these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models. The resulting algorithm is efficient and easy to implement. We perform extensive empirical studies on challenging benchmarks with considerable data noise and imbalance in the online batch selection scenario, and observe superior training efficiency over competitive baselines. Notably, on the challenging WebVision benchmark, our method can achieve similar predictive performance with significantly fewer training iterations than leading data selection methods.
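The generalization-loss-based selection principle that this abstract builds on can be illustrated with a simple reducible-loss-style rule: prefer examples the current model still gets wrong but a reference predictor finds easy. The sketch below is an assumption, not the paper's Bayesian method: it plugs a zero-shot predictor's per-example loss in where earlier work used a model trained on holdout data, and the name `select_batch` is hypothetical. Per-example losses are assumed to be computed elsewhere.

```python
from typing import List, Sequence


def select_batch(model_losses: Sequence[float],
                 reference_losses: Sequence[float], k: int) -> List[int]:
    """Return indices of the k candidates with the largest score
    model_loss - reference_loss.

    High model loss + low reference loss: learnable and not yet learned.
    High reference loss: likely noisy or mislabeled, so the score shrinks.
    Low model loss: already learned, so the score shrinks."""
    if len(model_losses) != len(reference_losses):
        raise ValueError("loss lists must have the same length")
    scores = [m - r for m, r in zip(model_losses, reference_losses)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    model = [2.0, 0.1, 3.0, 1.5]      # current model's loss per candidate
    reference = [0.2, 0.1, 2.9, 0.3]  # zero-shot predictor's loss (assumed available)
    print(select_batch(model, reference, k=2))  # [0, 3]
```

Candidate 2 has the highest raw loss but is skipped because the reference predictor also struggles with it, which is the behaviour one would want for mislabeled data; candidate 1 is skipped because the model has already learned it.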

machine learning, natural language, zero-shot predictor

Neural Information Processing Systems

Country: Asia > China (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Towards Accelerated Model Training via Bayesian Data Selection

Deng, Zhijie

Neural Information Processing Systems | Feb-8-2026, 12:35:39 GMT

Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety simultaneously. Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss.

machine learning, natural language, zero-shot predictor

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > California (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

SuRe: Surprise-Driven Prioritised Replay for Continual LLM Learning

Hazard, Hugo, Fountas, Zafeirios, Benfeghoul, Martin A., Oomerjee, Adnan, Wang, Jun, Bou-Ammar, Haitham

arXiv.org Artificial Intelligence | Dec-1-2025

Continual learning, the ability to adapt to a sequence of tasks without forgetting previously acquired knowledge, remains a major challenge in machine learning and a key gap between artificial and human intelligence. While regularisation and replay perform well in vision, they lag behind multi-task learning for large language models (LLMs), especially at scale with many tasks. We revisit replay and argue that two failure modes drive this gap: selection (what to rehearse) and integration (how to consolidate new knowledge). To address selection, we propose Surprise-prioritised Replay (SuRe), a simple, architecture-agnostic rule that ranks sequences by surprise (negative log-likelihood, NLL) and stores the most surprising ones. SuRe achieves state-of-the-art performance in the Large Number of Tasks (LNT) setting and delivers the best overall average across both Standard CL and LNT benchmarks. To address integration, we add a dual-learner design with fast and slow LoRA adapters merged via an exponential moving average (EMA), enabling rapid adaptation while stabilising long-term knowledge. Combining SuRe with the dual learner yields further gains, including improvements of up to +5 accuracy points on LNT over the prior state of the art. Ablation studies confirm that the method remains robust under reduced replay frequency and small buffer sizes, demonstrating both effectiveness and sample efficiency. Taken together, our results establish replay as a strong baseline for continual LLM fine-tuning and demonstrate that surprise-based selection and slow-weight consolidation are complementary components for mitigating catastrophic forgetting.
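The two components named in the SuRe abstract can be sketched in a few lines: a fixed-capacity buffer that keeps the sequences with the highest negative log-likelihood (NLL), and an exponential-moving-average update that blends fast weights into slow ones. This is a simplified illustration under assumptions: NLL scores are computed elsewhere by the model, weights are flat lists rather than LoRA adapter tensors, and `SurpriseBuffer`, `ema_update`, and `decay` are hypothetical names.

```python
import heapq
from typing import List, Sequence, Tuple


class SurpriseBuffer:
    """Fixed-capacity replay buffer that keeps the highest-NLL sequences."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Min-heap keyed on NLL, so the least surprising entry is evicted first.
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = 0  # tie-breaker: equal NLLs never compare the strings

    def add(self, sequence: str, nll: float) -> None:
        item = (nll, self._counter, sequence)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif self._heap and nll > self._heap[0][0]:
            # Evict the least surprising entry in favour of the new one.
            heapq.heapreplace(self._heap, item)

    def contents(self) -> List[str]:
        """Buffered sequences, most surprising first."""
        return [seq for _, _, seq in sorted(self._heap, reverse=True)]


def ema_update(slow: Sequence[float], fast: Sequence[float],
               decay: float) -> List[float]:
    """slow <- decay * slow + (1 - decay) * fast, element-wise."""
    return [decay * s + (1.0 - decay) * f for s, f in zip(slow, fast)]


if __name__ == "__main__":
    buffer = SurpriseBuffer(capacity=2)
    for seq, nll in [("a", 1.0), ("b", 3.0), ("c", 2.0), ("d", 0.5)]:
        buffer.add(seq, nll)
    print(buffer.contents())  # ['b', 'c']
```

A `decay` close to 1 makes the slow weights change gradually, which is one way to read "stabilising long-term knowledge" while the fast weights adapt to the current task.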

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2511.22367

Genre: Research Report > New Finding (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Mitigating Catastrophic Forgetting in Streaming Generative and Predictive Learning via Stateful Replay

Du, Wenzhang

arXiv.org Machine Learning | Nov-25-2025

Many deployed learning systems must update models on streaming data under memory constraints. The default strategy, sequential fine-tuning on each new phase, is architecture-agnostic but often suffers catastrophic forgetting when later phases correspond to different sub-populations or tasks. Replay with a finite buffer is a simple alternative, yet its behaviour across generative and predictive objectives is not well understood. We present a unified study of stateful replay for streaming autoencoding, time series forecasting, and classification. We view both sequential fine-tuning and replay as stochastic gradient methods for an ideal joint objective, and use a gradient alignment analysis to show when mixing current and historical samples should reduce forgetting. We then evaluate a single replay mechanism on six streaming scenarios built from Rotated MNIST, ElectricityLoadDiagrams 2011-2014, and Airlines delay data, using matched training budgets and three seeds. On heterogeneous multi-task streams, replay reduces average forgetting by a factor of two to three, while on benign time-based streams both methods perform similarly. These results position stateful replay as a strong and simple baseline for continual learning in streaming environments.
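A finite replay buffer and the mixing of current and historical samples can be sketched as below. The abstract does not say how the buffer is filled; this version assumes reservoir sampling (Algorithm R), which keeps a uniform random sample of everything seen so far, and replaces a fixed fraction of each current batch with replayed items so the per-step training budget stays matched. `ReservoirBuffer`, `mixed_batch`, and `replay_fraction` are hypothetical names.

```python
import random
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ReservoirBuffer(Generic[T]):
    """Fixed-capacity buffer holding a uniform random sample of all items seen."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[T]:
        """Up to k distinct buffered items (fewer if the buffer is smaller)."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(current: List[T], buffer: ReservoirBuffer[T],
                replay_fraction: float) -> List[T]:
    """Replace roughly `replay_fraction` of the current batch with replayed items.
    If the buffer holds fewer items than requested, the batch comes out smaller."""
    n_replay = round(len(current) * replay_fraction)
    return current[: len(current) - n_replay] + buffer.sample(n_replay)


if __name__ == "__main__":
    buffer: ReservoirBuffer[int] = ReservoirBuffer(capacity=5)
    for x in range(100):  # an earlier phase of the stream
        buffer.add(x)
    batch = mixed_batch(list(range(100, 110)), buffer, replay_fraction=0.3)
    print(len(batch), batch[:7])  # 10 [100, 101, 102, 103, 104, 105, 106]
```

In a streaming loop, the current items would also be passed to `buffer.add` after each step so that later phases can replay them; setting `replay_fraction` to 0 recovers plain sequential fine-tuning.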

classification, gradient, replay

arXiv.org Machine Learning

2511.17936

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure

Bafna, Niyati, Li, Tianjian, Murray, Kenton, Mortensen, David R., Yarowsky, David, Sirin, Hale, Khashabi, Daniel

arXiv.org Artificial Intelligence | Oct-22-2025

Multilingual generation with large language models (LLMs) is often of poor quality for mid- to low-resource languages, but the causes for this are not well understood. We first demonstrate the existence of an implicit task-solving → translation pipeline for generation, whereby the model first solves the required task in a largely target-language-agnostic manner, and subsequently translates answer concepts into the intended target language. We hypothesize that the failure of the translation stage, despite task-solving success, is an important culprit for the observed low quality of final outputs, and formalize this as the translation barrier hypothesis. We quantify the extent to which either stage in the pipeline is responsible for final failure for a word translation task across 108 language pairs, and find that the translation barrier explains a dominant portion of error for a majority of language pairs, and is especially severe for low-resource target languages. Our results highlight an important bottleneck for end-to-end multilingual generation, relevant for future work seeking to improve multilinguality in LLMs.
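The attribution described above, splitting final failures into task-solving failures and translation failures, reduces to simple bookkeeping once each example carries two judgments. The sketch assumes those judgments are already available: `solved` marks that the correct answer concept was produced in some language (how this is detected is outside the sketch and not specified here), and `correct_in_target` marks a correct final answer in the intended target language. `Outcome` and `translation_barrier_share` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Outcome:
    solved: bool             # correct answer concept reached in some language (assumed given)
    correct_in_target: bool  # final output correct in the intended target language


def translation_barrier_share(outcomes: Iterable[Outcome]) -> float:
    """Fraction of final failures in which the task was solved but the
    answer was not rendered correctly in the target language."""
    failures = [o for o in outcomes if not o.correct_in_target]
    if not failures:
        return 0.0
    translation_failures = sum(1 for o in failures if o.solved)
    return translation_failures / len(failures)


if __name__ == "__main__":
    results = [
        Outcome(solved=True, correct_in_target=True),
        Outcome(solved=True, correct_in_target=False),   # translation failure
        Outcome(solved=True, correct_in_target=False),   # translation failure
        Outcome(solved=False, correct_in_target=False),  # task-solving failure
    ]
    print(translation_barrier_share(results))  # 0.6666666666666666
```

Computed separately for each language pair, a share above 0.5 would mean the translation stage accounts for most of the errors on that pair.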

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2506.22724

Country:

North America > United States (0.28)
North America > Mexico > Mexico City (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Local Timescale Gates for Timescale-Robust Continual Spiking Neural Networks

Tiwari, Ansh, Chauhan, Ayush

arXiv.org Artificial Intelligence | Oct-16-2025

Spiking neural networks (SNNs) promise energy-efficient artificial intelligence on neuromorphic hardware but struggle with tasks requiring both fast adaptation and long-term memory, especially in continual learning. We propose Local Timescale Gating (LT-Gate), a neuron model that combines dual time-constant dynamics with an adaptive gating mechanism. Each spiking neuron tracks information on a fast and a slow timescale in parallel, and a learned gate locally adjusts their influence. This design enables individual neurons to preserve slow contextual information while responding to fast signals, addressing the stability-plasticity dilemma. We further introduce a variance-tracking regularization that stabilizes firing activity, inspired by biological homeostasis. Empirically, LT-Gate yields significantly improved accuracy and retention in sequential learning tasks: on a challenging temporal classification benchmark it achieves about 51 percent final accuracy, compared to about 46 percent for a recent Hebbian continual-learning baseline and lower for prior SNN methods. Unlike approaches that require external replay or expensive orthogonalizations, LT-Gate operates with local updates and is fully compatible with neuromorphic hardware. In particular, it leverages features of Intel's Loihi chip (multiple synaptic traces with different decay rates) for on-chip learning. Our results demonstrate that multi-timescale gating can substantially enhance continual learning in SNNs, narrowing the gap between spiking and conventional deep networks on lifelong-learning tasks.
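The core idea of LT-Gate, one neuron tracking its input on a fast and a slow timescale with a learned gate mixing the two, can be illustrated with a discrete-time leaky integrator. The update rules below are a hypothetical sketch, not the paper's equations: each trace decays by a factor exp(-dt/tau), the gate is a sigmoid of a per-neuron parameter `gate_logit`, and a spike triggers a soft reset that subtracts the threshold from both traces. The variance-tracking regularizer and the on-chip learning rules are not modeled; `DualTimescaleNeuron` and its fields are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class DualTimescaleNeuron:
    """Hypothetical gated dual-timescale leaky integrate-and-fire neuron."""
    tau_fast: float = 2.0    # fast time constant (in steps)
    tau_slow: float = 20.0   # slow time constant (in steps)
    gate_logit: float = 0.0  # learned per-neuron parameter; sigmoid gives the gate
    threshold: float = 1.0
    v_fast: float = 0.0      # fast trace state
    v_slow: float = 0.0      # slow trace state

    def step(self, current: float, dt: float = 1.0) -> bool:
        """Advance one time step; return True if the neuron spikes."""
        a_fast = math.exp(-dt / self.tau_fast)
        a_slow = math.exp(-dt / self.tau_slow)
        # Each trace is an exponential moving average of the input current.
        self.v_fast = a_fast * self.v_fast + (1.0 - a_fast) * current
        self.v_slow = a_slow * self.v_slow + (1.0 - a_slow) * current
        g = 1.0 / (1.0 + math.exp(-self.gate_logit))
        membrane = g * self.v_fast + (1.0 - g) * self.v_slow
        if membrane >= self.threshold:
            # Soft reset: subtracting the threshold from both traces lowers the
            # gated membrane value by exactly the threshold.
            self.v_fast -= self.threshold
            self.v_slow -= self.threshold
            return True
        return False


if __name__ == "__main__":
    fast = DualTimescaleNeuron(gate_logit=10.0)   # gate near 1: fast trace dominates
    slow = DualTimescaleNeuron(gate_logit=-10.0)  # gate near 0: slow trace dominates
    fast_spikes = [fast.step(3.0) for _ in range(20)]
    slow_spikes = [slow.step(3.0) for _ in range(20)]
    print(fast_spikes.index(True), slow_spikes.index(True))  # 0 8
```

With the gate near 1 the neuron reacts to a strong input on the first step, while with the gate near 0 it integrates for several steps before firing; learning `gate_logit` per neuron lets each unit choose its own balance, which is roughly the stability-plasticity trade-off the abstract describes.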

artificial intelligence, machine learning, neuron

arXiv.org Artificial Intelligence

2510.12843

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting > Continuing Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

uses the final accuracy of the SGD as a sanity check for the quality of models trained with AutoAssist (e.g., BLEU

Neural Information Processing Systems | Oct-3-2025, 07:27:52 GMT

We thank the reviewers for their comments. We will carefully modify the paper according to the suggestions. Figure 1: Comparison of different learning schemes on RotMNIST classification and IWSLT translation tasks. For the NMT tasks, we used the same parameter settings as previous papers, as described in Section 5.2. The assistant model shows similar performance across different batch sizes. However, we will provide results on the raw ImageNet dataset and a large Transformer model in the revised version.

accuracy, autoassist, sanity check

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Task-Focused Consolidation with Spaced Recall: Making Neural Networks Learn like College Students

Bamnodkar, Prital

arXiv.org Artificial Intelligence | Sep-16-2025

Deep neural networks often suffer from a critical limitation known as catastrophic forgetting, where performance on past tasks degrades after learning new ones. This paper introduces a novel continual learning approach inspired by human learning strategies like Active Recall, Deliberate Practice, and Spaced Repetition, named Task-Focused Consolidation with Spaced Recall (TFC-SR). TFC-SR enhances the standard experience replay framework with a mechanism we term the Active Recall Probe. It is a periodic, task-aware evaluation of the model's memory that stabilizes the representations of past knowledge. We test TFC-SR on the Split MNIST and the Split CIFAR-100 benchmarks against leading regularization-based and replay-based baselines. Our results show that TFC-SR performs significantly better than these methods. For instance, on the Split CIFAR-100, it achieves a final accuracy of 13.17% compared to Standard Experience Replay's 7.40%. We demonstrate that this advantage comes from the stabilizing effect of the probe itself, and not from the difference in replay volume. Additionally, we analyze the trade-off between memory size and performance and show that while TFC-SR performs better in memory-constrained environments, higher replay volume is still more effective when available memory is abundant. We conclude that TFC-SR is a robust and efficient approach, highlighting the importance of integrating active memory retrieval mechanisms into continual learning systems.
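The abstract describes the Active Recall Probe as a periodic, task-aware evaluation of the model's memory inside an experience-replay loop. Below is a minimal sketch of that evaluation step under assumptions: buffered samples are `(task_id, input, label)` triples, `predict` is any classifier callable, and the probe simply reports per-task accuracy every `period` steps. The abstract does not specify how TFC-SR uses the probe to stabilize representations, so that part is left out; `active_recall_probe` and `train_with_probes` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

Sample = Tuple[int, Any, int]  # (task_id, input, label); hypothetical layout


def active_recall_probe(predict: Callable[[Any], int],
                        buffer: Sequence[Sample]) -> Dict[int, float]:
    """Task-aware evaluation: accuracy on buffered samples, grouped by task."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for task_id, x, y in buffer:
        total[task_id] += 1
        correct[task_id] += int(predict(x) == y)
    return {task: correct[task] / total[task] for task in total}


def train_with_probes(train_step: Callable[[int], None],
                      predict: Callable[[Any], int],
                      buffer: Sequence[Sample], num_steps: int,
                      period: int) -> List[Tuple[int, Dict[int, float]]]:
    """Run `num_steps` training steps (e.g. experience-replay updates) and
    probe the buffer every `period` steps, returning (step, report) pairs."""
    reports = []
    for step in range(1, num_steps + 1):
        train_step(step)
        if step % period == 0:
            reports.append((step, active_recall_probe(predict, buffer)))
    return reports


if __name__ == "__main__":
    memory = [(0, 2, 0), (0, 3, 0), (1, 5, 1), (1, 7, 1)]

    def parity(x: int) -> int:  # stand-in classifier
        return x % 2

    def no_op(step: int) -> None:  # stand-in for one replay update
        pass

    for step, report in train_with_probes(no_op, parity, memory, num_steps=10, period=5):
        print(step, report)
    # 5 {0: 0.5, 1: 1.0}
    # 10 {0: 0.5, 1: 1.0}
```

Spacing the probes by `period` keeps their cost small relative to training; the per-task report makes it visible which earlier task is losing accuracy.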

artificial intelligence, machine learning, tfc-sr

arXiv.org Artificial Intelligence

2507.21109

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting > Higher Education (0.76)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Teaching AI to Remember: Insights from Brain-Inspired Replay in Continual Learning

Kim, Jina

arXiv.org Artificial Intelligence | Sep-3-2025

Despite significant advancements in deep learning, artificial neural networks (ANNs) still suffer from catastrophic forgetting in continual learning, where training on new tasks causes them to easily forget previously learned information. In contrast, the human brain retains diverse information through declarative and nondeclarative memory systems ([Bear et al., 2020, Figure 24.1, p. 838]), storing it in either short-term or long-term memory. A key factor that protects humans from drastic forgetting is thought to be the reactivation of neural activity patterns representing previous experiences, referred to as memory replay (Wilson and McNaughton [1994], Rasch and Born [2007], Oudiette and Paller [2013], Van de Ven et al. [2016]). To address catastrophic forgetting in ANNs, previous works have attempted to mimic the brain's memory replay mechanism. Notably, studies such as Van de Ven et al. [2020], Millichamp and Chen [2021], Ran et al. [2024] have demonstrated that brain-inspired mechanisms can help retain performance during continual learning in AI. Motivated by these findings, we aim to draw inspiration from the brain to develop mechanisms for long-term memory in AI. Specifically, we focus on analyzing the impact of brain-inspired components on AI performance and providing insights to guide future research directions.

artificial intelligence, machine learning, replay

arXiv.org Artificial Intelligence

2509.00047

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)